IEEE INFOCOM 2023

Program at a Glance

Session Keynote: Opening, Awards, and Keynote

Session Break-1-Day1: Coffee Break
Session Lunch-Day1: Conference Lunch
Session Dinner-Day1: Chartered Cruise Dinner (for attendees with Full Conference Registrations)

Session A-1: Cloud/Edge Computing 1
Session A-2: Wireless/Mobile Learning
Session A-3: Security and Privacy

Session B-1: Federated Learning 1
Session B-2: Federated Learning 2
Session B-3: Federated Learning 3

Session C-1: LoRa and LPWAN
Session C-2: Satellite/Space Networking
Session C-3: Internet Routing

Session D-1: mmWave 1
Session D-2: mmWave 2
Session D-3: mmWave 3

Session E-1: Video Streaming 1
Session E-2: Video Streaming 2
Session E-3: Video Streaming 3

Session F-1: Datacenter and Switches
Session F-2: Memory/Cache Management 1
Session F-3: Internet Measurement

Session G-1: Theory 1
Session G-2: Theory 2
Session G-3: 5G

Session Demo-1: Demo Session 1

Buffer Awareness Neural Adaptive Video Streaming for Avoiding Extra Buffer Consumption

Tianchi Huang (Tsinghua University, China); Chao Zhou (Beijing Kuaishou Technology Co., Ltd, China); Rui-Xiao Zhang, Chenglei Wu and Lifeng Sun (Tsinghua University, China)

Adaptive video streaming has become a major scheme for transmitting videos with high quality of experience (QoE). However, improved network conditions and the high compression efficiency of videos enable clients to accumulate too much buffer, which can cause colossal data waste if users close the session early. In this paper, we consider buffer-aware adaptive bitrate (ABR) mechanisms to overcome the above concerns. Formulating the buffer-aware rate adaptation problem as multi-objective optimization, we propose DeepBuffer, a deep reinforcement learning-based approach that jointly selects a proper bitrate and controls the maximum buffer. To deal with the challenges of learning-based buffer-aware ABR composition, such as infinite possible plans, multiple bitrate levels, and a complex action space, we design preference-driven inputs and separate action outputs, and devise sample-efficient training methodologies. We train DeepBuffer with a broad set of real-world network traces and provide a comprehensive evaluation across various network scenarios and different video types. Experimental results indicate that DeepBuffer rivals or outperforms recent heuristics and learning-based ABR schemes in terms of QoE while substantially reducing the average buffer consumption by up to 90%. Extensive real-world experiments further demonstrate the substantial superiority of DeepBuffer.

Speaker Tianchi Huang (Tsinghua University)

Tianchi Huang (Student Member, IEEE) received the M.E. degree from the Department of Computer Science and Technology, Guizhou University, in 2018. He is currently pursuing the Ph.D. degree with the Department of Computer Science and Technology, Tsinghua University, advised by Prof. Lifeng Sun. His research focuses on multimedia network streaming, including stream transmission and edge-assisted content delivery. He received the Best Student Paper Award from the ACM Multimedia System 2019 Workshop. He has been a Reviewer of IEEE Transactions on Vehicular Technology and IEEE Transactions on Multimedia.

From Ember to Blaze: Swift Interactive Video Adaptation via Meta-Reinforcement Learning

Xuedou Xiao, Mingxuan Yan and Yingying Zuo (Huazhong University of Science and Technology, China); Boxi Liu and Paul Ruan (Tencent Technology Co. Ltd, China); Yang Cao and Wei Wang (Huazhong University of Science and Technology, China)

Maximizing quality of experience (QoE) for interactive video streaming has been a long-standing challenge, as its delay-sensitive nature makes it more vulnerable to bandwidth fluctuations. While reinforcement learning (RL) has demonstrated great potential in optimizing video streaming, recent advances are either limited by fixed models or require enormous data/time for online adaptation, and thus struggle to fit time-varying and diverse network states. Driven by these practical concerns, we perform large-scale measurements on WeChat for Business's interactive video service to study real-world network fluctuations. Surprisingly, our analysis shows that, compared to time-varying network metrics, network sequences exhibit noticeable short-term continuity, sufficient to meet the few-shot learning requirement. We thus propose Fiammetta, the first meta-RL-based bitrate adaptation algorithm for interactive video streaming. Building on the short-term continuity, Fiammetta accumulates learning experiences through meta-training and enables fast online adaptation to changing network states through a few gradient updates. Moreover, Fiammetta incorporates a probing mechanism for real-time monitoring of network states, and proposes an adaptive meta-testing mechanism for seamless adaptation. We implement Fiammetta on a testbed whose end-to-end network follows the real-world WeChat for Business traces. The results show that Fiammetta outperforms prior algorithms significantly, improving video bitrate by 3.6%-16.2% without increasing stalling rate.

Speaker Mingxuan Yan (Huazhong University of Science and Technology)

I'm a Ph.D. student at Huazhong University of Science and Technology and the co-first author of the paper "From Ember to Blaze: Swift Interactive Video Adaptation via Meta-Reinforcement Learning".

RDladder: Resolution-Duration Ladder for VBR-encoded Videos via Imitation Learning

Lianchen Jia (Tsinghua University, China); Chao Zhou (Beijing Kuaishou Technology Co., Ltd, China); Tianchi Huang, Chaoyang Li and Lifeng Sun (Tsinghua University, China)

With the rapid development of streaming systems, a large number of videos need to be transcoded into multiple copies according to the encoding ladder, which significantly increases storage overhead. This scenario presents new challenges in achieving the balance between better quality for users and less storage cost. In our work, we observe two significant points. The first one is that selecting proper resolutions under certain network conditions can reduce storage costs while maintaining a great quality of experience. The second one is that segment duration is critical, especially in VBR-encoded videos. Considering these points, we propose RDladder, a resolution-duration ladder for VBR-encoded videos via imitation learning. We jointly optimize resolution and duration using neural networks to determine the combination of these two metrics considering network capacity, video information, and storage cost. To get more faithful results, we use over 500 videos, encoded to over 2,000,000 chunks, and collect real-world network traces for more than 50 hours. We test RDladder in simulation, emulation, and real-world environments under various network conditions, and our method can achieve near-optimal performance. Furthermore, we discuss the interaction between RDladder and ABR algorithms and summarize some characteristics of RDladder.

Speaker Lianchen Jia (Tsinghua University)

Second-year Ph.D. student; research interests include multimedia transmission.

Energy-Efficient 360-Degree Video Streaming on Multicore-Based Mobile Devices

Xianda Chen and Guohong Cao (The Pennsylvania State University, USA)

Streaming (downloading and processing) 360-degree video consumes a large amount of energy on mobile devices, but little work has been done to address this problem, especially considering recent advances in the mobile architecture. Through real measurements, we found that existing systems activate all processor cores during video streaming, which causes high energy consumption. This is unnecessary, since most heavy computations in 360-degree video processing are handled by hardware accelerators such as the hardware decoder and the GPU. To address this problem, we propose to save energy by selectively activating the proper processor cluster and adaptively adjusting the CPU frequency based on the video quality. We model the impact of video resolution and CPU frequency on power consumption, and model the impact of video features and network effects on Quality of Experience (QoE). Based on the QoE model and the power model, we formulate the energy- and QoE-aware 360-degree video streaming problem as an optimization problem. We first present an optimal algorithm which can maximize QoE and minimize energy. Since the optimal algorithm requires future knowledge, we then propose a heuristic-based algorithm. Evaluation results show that our heuristic-based algorithm can significantly reduce the energy consumption while maintaining QoE.

Speaker Xianda Chen

Xianda Chen received his Ph.D. degree from the Pennsylvania State University and currently works at Microsoft. His research interests include wireless networks, mobile computing, and video streaming.

Session Chair

Tao Li

OmniSense: Towards Edge-Assisted Online Analytics for 360-Degree Videos

Miao Zhang (Simon Fraser University, Canada); Yifei Zhu (Shanghai Jiao Tong University, China); Linfeng Shen (Simon Fraser University, Canada); Fangxin Wang (The Chinese University of Hong Kong, Shenzhen, China); Jiangchuan Liu (Simon Fraser University, Canada)

With the reduced hardware costs of omnidirectional cameras and the proliferation of various extended reality applications, more and more 360° videos are being captured. To fully unleash their potential, advanced video analytics is expected to extract actionable insights and situational knowledge without blind spots from the videos. In this paper, we present OmniSense, a novel edge-assisted framework for online immersive video analytics. OmniSense achieves both low latency and high accuracy, combating the significant computation and network resource challenges of analyzing 360° videos. Motivated by our measurement insights into 360° videos, OmniSense introduces a lightweight spherical region of interest (SRoI) prediction algorithm to prune redundant information in 360° frames. Incorporating the video content and network dynamics, it then smartly scales vision models to analyze the predicted SRoIs with optimized resource utilization. We implement a prototype of OmniSense with commodity devices and evaluate it on diverse real-world collected 360° videos. Extensive evaluation results show that compared to resource-agnostic baselines, it improves the accuracy by 19.8%-114.6% with similar end-to-end latencies. Meanwhile, it hits 2.0×-2.4× speedups while keeping the accuracy on a par with the highest accuracy of baselines.

Speaker Jiangchuan Liu (Simon Fraser University)

Jiangchuan Liu is a Professor in the School of Computing Science, Simon Fraser University, British Columbia, Canada. He is a Fellow of The Canadian Academy of Engineering and an IEEE Fellow. He has served on the editorial boards of IEEE/ACM Transactions on Networking, IEEE Transactions on Multimedia, IEEE Communications Surveys and Tutorials, etc. He was a Steering Committee member of IEEE Transactions on Mobile Computing. He was TPC Co-Chair of IEEE INFOCOM'2021.

Meta Reinforcement Learning for Rate Adaptation

Abdelhak Bentaleb (Concordia University, Canada); May Lim (National University of Singapore, Singapore); Mehmet N Akcay and Ali C. Begen (Ozyegin University, Turkey); Roger Zimmermann (National University of Singapore, Singapore)

The goal of an adaptive bitrate (ABR) scheme is to enable streaming clients to adapt to time-varying network and device conditions to deliver a stall-free viewing experience. Today, most ABR schemes use manually tuned heuristics or learning-based methods. Heuristics are easy to implement but do not always perform well, whereas learning-based methods generally perform well but are difficult to deploy on low-resource devices. To make the most out of both worlds, we develop Ahaggar, a learning-based scheme running on the server side that provides quality-aware bitrate guidance to the streaming clients that run their own heuristics. The novelty behind Ahaggar is the meta reinforcement learning approach taking network conditions, clients' statuses and device resolutions, and streamed content as input features to perform bitrate guidance. Ahaggar uses the emerging CMCD/SD (Common Media Client/Server Data) protocols to exchange the necessary metadata between the servers and clients. Experiments run on a full (open-source) system show that Ahaggar adapts to unseen conditions fast and outperforms its competitors in terms of several viewer experience metrics.

Speaker Roger Zimmermann (National University of Singapore)

Roger Zimmermann received his M.S. and Ph.D. degrees from the University of Southern California (USC), USA. He is currently a professor with the Department of Computer Science, National University of Singapore (NUS), Singapore. He is also a lead investigator with the Grab-NUS AI Lab, and from 2011 to 2021 he was Deputy Director with the Smart Systems Institute (SSI) at NUS. He has coauthored a book, seven patents, and more than 350 conference publications, journal articles, and book chapters in the areas of multimedia processing, networking and data analytics. He is a distinguished member of the ACM and a senior member of the IEEE. He was Secretary of ACM SIGSPATIAL (2014-2017), a director of the IEEE Multimedia Communications Technical Committee (MMTC) Review Board and an editorial board member of the Springer MTAP journal. He is also an associate editor with IEEE MultiMedia, ACM TOMM and IEEE OJ-COMS. More information can be found at http://www.comp.nus.edu.sg/~rogerz.

Cross-Camera Inference on the Constrained Edge

Jingzong Li (City University of Hong Kong, Hong Kong); Libin Liu (Zhongguancun Laboratory, China); Hong Xu (The Chinese University of Hong Kong, Hong Kong); Shudeng Wu (Tsinghua University, China); Chun Xue (City University of Hong Kong, Hong Kong)

The proliferation of edge devices has pushed computing from the cloud to the data sources, and video analytics is among the most promising applications of edge computing. Running video analytics is compute-intensive and latency-sensitive, as video frames are analyzed by complex deep neural networks (DNNs), which put severe pressure on resource-constrained edge devices. To resolve the tension between inference latency and resource cost, we present Polly, a cross-camera inference system that enables co-located cameras that have different but overlapping fields of view (FoVs) to share inference results with each other, thus eliminating the redundant inference work for objects in the same physical area. Polly's design solves two basic challenges of cross-camera inference: how to identify overlapping FoVs automatically, and how to share inference results accurately across cameras. Evaluation on NVIDIA Jetson Nano with a real-world traffic surveillance dataset shows that Polly reduces the inference latency by up to 71.6% while achieving almost the same detection accuracy as state-of-the-art systems.

Speaker Jingzong Li (City University of Hong Kong)

AdaptSLAM: Edge-Assisted Adaptive SLAM with Resource Constraints via Uncertainty Minimization

Ying Chen (Duke University, USA); Hazer Inaltekin (Macquarie University, Australia); Maria Gorlatova (Duke University, USA)

Edge computing is increasingly proposed as a solution for reducing resource consumption of mobile devices running simultaneous localization and mapping (SLAM) algorithms, with most edge-assisted SLAM systems assuming the communication resources between the mobile device and the edge server to be unlimited, or relying on heuristics to choose the information to be transmitted to the edge. This paper presents AdaptSLAM, an edge-assisted visual (V) and visual-inertial (VI) SLAM system that adapts to the available communication and computation resources, based on a theoretically grounded method we developed to select the subset of keyframes (the representative frames) for constructing the best local and global maps in the mobile device and the edge server under resource constraints. We implemented AdaptSLAM to work with the state-of-the-art open-source V- and VI-SLAM ORB-SLAM3 framework, and demonstrated that, under constrained network bandwidth, AdaptSLAM reduces the tracking error by 62% compared to the best baseline.

Speaker Ying Chen (Duke University)

Ying Chen is a Ph.D. candidate in the Electrical and Computer Engineering Department at Duke University. She works under the guidance of Prof. Maria Gorlatova in the Intelligent Interactive Internet of Things Lab. Her research interests lie in building resource-efficient and network-adaptive virtual and augmented reality systems.

Session Chair

Sanjib Sur

Who is the Rising Star? Demystifying the Promising Streamers in Crowdsourced Live Streaming

Rui-Xiao Zhang, Tianchi Huang, Chenglei Wu and Lifeng Sun (Tsinghua University, China)

Streamers are the core competency of the crowdsourced live streaming (CLS) platform. However, little work has explored how different factors relate to their popularity evolution patterns. In this paper, we investigate a critical problem: how can promising streamers be discovered in their early stage? We find that streamers can indeed be clustered into two evolution types (i.e., rising type and normal type), and that these two types of streamers differ in some inherent properties. Traditional time-sequential models cannot handle this problem, because they are unable to capture the complicated interactivity and extensive heterogeneity in CLS scenarios. To address their shortcomings, we further propose Niffler, a novel heterogeneous attention temporal graph framework (HATG) for predicting the evolution types of CLS streamers. Specifically, through the graph neural network (GNN) and gated-recurrent-unit (GRU) structure, Niffler can capture both the interactive features and the evolutionary dynamics. Moreover, by integrating the attention mechanism in the model design, Niffler can intelligently preserve the heterogeneity when learning different levels of node representations. We systematically compare Niffler against multiple baselines from different categories, and the experimental results show that our proposed model can achieve the best prediction performance.

Speaker Rui-Xiao Zhang (Tsinghua University)

Rui-Xiao Zhang received his B.E. and Ph.D. degrees from Tsinghua University in 2013 and 2017, respectively. Currently, he is a post-doctoral fellow at the University of Hong Kong. His research interests lie in the area of content delivery networks, the optimization of multimedia streaming, and machine learning for systems. He has published more than 20 papers in top conferences, including ACM Multimedia and IEEE INFOCOM. He also serves as a reviewer for JSAC, TCSVT, TMM, and TMC. He received the Best Student Paper Award presented by the ACM Multimedia System Workshop in 2019.

StreamSwitch: Fulfilling Latency Service-Layer Agreement for Stateful Streaming

Zhaochen She, Yancan Mao, Hailin Xiang, Xin Wang and Richard T. B. Ma (National University of Singapore, Singapore)

Distributed stream systems provide low latency by processing data as it arrives. However, existing systems do not provide latency guarantees, a critical requirement of real-time analytics, especially for stateful operators under bursty and skewed workloads. We present StreamSwitch, a control plane for stream systems to bound operator latency while optimizing resource usage. Based on a novel stream switch abstraction that unifies dynamic scaling and load balancing into a holistic control framework, our design incorporates reactive and predictive metrics to deduce the healthiness of executors and prescribes practically optimal scaling and load balancing decisions in time. We implement a prototype of StreamSwitch and integrate it with Apache Flink and Samza. Experimental evaluations on real-world applications and benchmarks show that StreamSwitch provides cost-effective solutions for bounding latency and outperforms the state-of-the-art alternative solutions.

Speaker Zhaochen She (National University of Singapore)

Latency-Oriented Elastic Memory Management at Task-Granularity for Stateful Streaming Processing

Rengan Dou and Richard T. B. Ma (National University of Singapore, Singapore)

In a streaming application, an operator is usually instantiated into multiple tasks for parallel processing. Tasks across operators have various memory demands due to different processing logic, e.g., stateful tasks versus stateless tasks. The memory demands of tasks from the same operator could also vary and fluctuate due to workload variability. Improper memory provision will cause some tasks to have relatively high latency, or even unbounded latency that can eventually lead to system instability. We found that the task with the maximum latency of an operator has a significant and even decisive impact on the end-to-end latency. In this paper, we present our task-level memory manager. Based on our quantitative modeling of memory and task-level latency, the manager can adaptively allocate the optimal memory size to each task to minimize the end-to-end latency. We integrate our memory management into Apache Flink. The experiments show that our memory management could reduce the end-to-end (E2E) latency by more than 46% (P99) and 40% (mean) compared to Flink's native setting.

Speaker Rengan Dou (National University of Singapore)

Rengan Dou is a Ph.D. student at the School of Computing, National University of Singapore, supervised by Prof. Richard T. B. Ma. He received his bachelor's degree in Computer Science from the University of Science and Technology of China. His research broadly covers resource management on clouds, auto-scaling, and state management on stream systems.

Hawkeye: A Dynamic and Stateless Multicast Mechanism with Deep Reinforcement Learning

Lie Lu (Tsinghua University, China); Qing Li and Dan Zhao (Peng Cheng Laboratory, China); Yuan Yang and Zeyu Luan (Tsinghua University, China); Jianer Zhou (SUSTech, China); Yong Jiang (Graduate School at Shenzhen, Tsinghua University, China); Mingwei Xu (Tsinghua University, China)

Multicast traffic is growing rapidly due to the development of multimedia streaming. Lately, stateless multicast protocols, such as BIER, have been proposed to solve the excessive routing states problem of traditional multicast protocols. However, the high complexity of multicast tree computation and the limited scalability for concurrent requests still pose daunting challenges, especially under dynamic group membership. In this paper, we propose Hawkeye, a dynamic and stateless multicast mechanism based on a deep reinforcement learning (DRL) approach. For real-time responses to multicast requests, we leverage DRL enhanced by a temporal convolutional network (TCN) to model the sequential feature of dynamic group membership, and are thus able to build multicast trees proactively for upcoming requests. Moreover, an innovative source aggregation mechanism is designed to help the DRL agent converge when faced with a large number of multicast requests, and to relieve ingress routers from excessive routing states. Evaluation with real-world topologies and multicast requests demonstrates that Hawkeye adapts well to dynamic multicast: it reduces the variation of path latency by up to 89.5% with less than 12% additional bandwidth consumption compared with the theoretical optimum.

Speaker Lie Lu (Tsinghua University)

Lie Lu is currently pursuing the M.S. degree in Tsinghua Shenzhen International Graduate School, Tsinghua University, China. His research interests include network routing and the application of Artificial Intelligence in routing optimization.

Session Chair

Debashri Roy
